Answer the following questions:
In this activity, you will:
O’Sullivan D and Unwin D (2010) Geographic Information Analysis, 2nd Edition, Chapter 7. John Wiley & Sons: New Jersey.
For this activity you will need the following:
An R markdown notebook version of this document (the source file).
A package called geog4ga3.
It is good practice to clear the workspace to make sure that you do not have extraneous items there when you begin your work. The command in R to clear the workspace is rm (for “remove”), followed by a list of items to be removed. To clear the workspace of all objects, do the following:
rm(list = ls())
Note that ls() lists all objects currently in the workspace.
Load the libraries you will use in this activity.
In addition to tidyverse, you will need sf, a package that implements simple features in R, and spdep, a package that implements several spatial statistical methods:
library(tidyverse)
library(sf)
library(spdep)
library(geog4ga3)
Begin by loading the data that you will use in this activity:
data(Hamilton_CT)
This is an sf object with census tracts and selected demographic variables for the Hamilton CMA in Canada. You can obtain new (calculated) variables as follows. For instance, to obtain the proportion of residents who are between 20 and 34 years old, and between 35 and 49:
Hamilton_CT <- mutate(Hamilton_CT, Prop20to34 = (AGE_20_TO_24 + AGE_25_TO_29 + AGE_30_TO_34)/POPULATION, Prop35to49 = (AGE_35_TO_39 + AGE_40_TO_44 + AGE_45_TO_49)/POPULATION)
You can also convert the sf object into a SpatialPolygonsDataFrame object for use with the spdep package:
Hamilton_CT.sp <- as(Hamilton_CT, "Spatial")
This function is used to create local Moran maps:
localmoran.map <- function(p = p, listw = listw, VAR = VAR, by = by){
require(tidyverse)
require(spdep)
require(plotly)
df_msc <- transmute(p,
key = p[[by]],
Z = (p[[VAR]] - mean(p[[VAR]])) / sd(p[[VAR]]),
SMA = lag.listw(listw, Z),
Type = factor(ifelse(Z < 0 & SMA < 0, "LL",
ifelse(Z > 0 & SMA > 0, "HH", "HL/LH"))))
local_I <- localmoran(p[[VAR]], listw)
df_msc <- left_join(df_msc,
data.frame(key = p[[by]], local_I))
df_msc <- rename(df_msc, p.val = Pr.z...0.)
plot_ly(df_msc) %>%
add_sf(split = ~(p.val < 0.05), color = ~Type, colors = c("red", "khaki1", "dodgerblue", "dodgerblue4"))
}
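To see how the quadrant labels in the function above come about, here is a minimal sketch in base R. It assumes a made-up chain of five zones and a hand-built row-standardized weights matrix; the names x, A, W, z, and sma are illustrative only and are not part of spdep or geog4ga3.

```r
# Toy example: five zones on a line, each neighbouring the adjacent zone(s).
# All values and the weights matrix are made up for illustration.
x <- c(10, 12, 11, 2, 1)

# Binary adjacency for the chain 1-2-3-4-5
A <- matrix(0, 5, 5)
for (i in 1:4) {
  A[i, i + 1] <- 1
  A[i + 1, i] <- 1
}

# Row-standardize so that each row sums to one (as with style "W")
W <- A / rowSums(A)

# Mean-centre the variable; only the sign matters for the classification
z <- x - mean(x)

# Spatial moving average of the centred variable
sma <- as.vector(W %*% z)

# Same rule as in the function: both negative is "LL", both positive is "HH"
type <- ifelse(z < 0 & sma < 0, "LL",
               ifelse(z > 0 & sma > 0, "HH", "HL/LH"))
data.frame(x, z, sma, type)
```

In this toy case the third zone is above the mean while its neighbours average below it, so it falls in the mixed "HL/LH" class.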
This function is used to create \(G_i^*\) maps:
gistar.map <- function(p = p, listw = listw, VAR = VAR, by = by){
require(tidyverse)
require(spdep)
require(sf)
require(plotly)
p <- mutate(p, key = p[[by]])
df.lg <- localG(p[[VAR]], listw)
df.lg <- as.numeric(df.lg)
df.lg <- data.frame(Gstar = df.lg, p.val = 2 * pnorm(abs(df.lg), lower.tail = FALSE))
df.lg <- mutate(df.lg,
Type = factor(ifelse(Gstar < 0 & p.val <= 0.05, "Low Concentration",
ifelse(Gstar > 0 & p.val <= 0.05, "High Concentration", "Not Significant"))))
p <- left_join(p,
data.frame(key = p[[by]], df.lg))
plot_ly(p) %>%
add_sf(split = ~(p.val < 0.05), color = ~Type, colors = c("red", "dodgerblue", "gray"))
}
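The classification step in the function above can be tried in isolation. The following is a small sketch that assumes a handful of hypothetical standardized \(G_i^*\) values (the vector gstar is invented for illustration) and applies the same two-sided normal p-value and the same 0.05 threshold.

```r
# Hypothetical standardized G_i^* values (z-scores), one per zone
gstar <- c(-2.5, -0.4, 0.0, 1.2, 3.1)

# Two-sided p-value under a standard normal reference distribution
p_val <- 2 * pnorm(abs(gstar), lower.tail = FALSE)

# Significant negative values mark low concentrations,
# significant positive values mark high concentrations
type <- ifelse(gstar < 0 & p_val <= 0.05, "Low Concentration",
               ifelse(gstar > 0 & p_val <= 0.05, "High Concentration",
                      "Not Significant"))
data.frame(gstar, p_val = round(p_val, 4), type)
```

Only the first and last values are extreme enough to be flagged at the 0.05 level; the others are labelled "Not Significant" regardless of their sign.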
Create spatial weights.
Hamilton_CT.w <- nb2listw(poly2nb(pl = Hamilton_CT.sp))
Hamilton_CT.3knb <- Hamilton_CT.sp %>% coordinates() %>% dnearneigh(d1 = 0, d2 = 3)
Hamilton_CT.3kw <- nb2listw(include.self(Hamilton_CT.3knb), style = "B")
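The two kinds of weights created above differ in whether a zone counts itself and in whether the weights are binary or row-standardized. The sketch below illustrates the idea by hand in base R, assuming four hypothetical centroids and a distance threshold of 3 in arbitrary units; it does not call spdep, and the objects xy, d, B_self, B, and W are illustrative names.

```r
# Hypothetical centroid coordinates for four zones (arbitrary units)
xy <- cbind(x = c(0, 1, 2, 6), y = c(0, 0, 0, 0))

# Pairwise distances between centroids
d <- as.matrix(dist(xy))

# Binary distance-band weights: 1 if within the threshold, 0 otherwise.
# Because d[i, i] = 0, each zone counts itself, mirroring include.self().
B_self <- (d <= 3) * 1

# Dropping the diagonal gives weights that exclude the zone itself
B <- B_self
diag(B) <- 0

# Row-standardizing makes each row sum to one.
# Zone 4 has no neighbours within the band, so its row is left at zero here.
rs <- rowSums(B)
W <- B / ifelse(rs > 0, rs, 1)

rowSums(B_self)
rowSums(W)
```

With binary weights that include the zone itself, the row sum is simply the number of zones within the band, which is the form used for \(G_i^*\) in this activity.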
You are now ready for the next activity.
Create local Moran maps for the population and proportion of population in the age group 20-34. What is the difference between using population (absolute) and proportion of population (rate)? Is there a reason to prefer either variable in analysis? Discuss.
Use the \(G_i^*\) statistic to analyze the population and proportion of population in the age group 20-34. What is the difference between using population (absolute) and proportion of population (rate)? Is there a reason to prefer either variable in analysis? Discuss.
gistar.map(Hamilton_CT, Hamilton_CT.3kw, "POP_DENSITY", by = "TRACT")
Joining, by = "key"
Column `key` joining character vector and factor, coercing into character vector
No trace type specified:
Based on info supplied, a 'scatter' trace seems appropriate.
Read more about this trace type -> https://plot.ly/r/reference/#scatter
gistar.map(Hamilton_CT, Hamilton_CT.3kw, "POPULATION", by = "TRACT")
Joining, by = "key"
Column `key` joining character vector and factor, coercing into character vector
No trace type specified:
Based on info supplied, a 'scatter' trace seems appropriate.
Read more about this trace type -> https://plot.ly/r/reference/#scatter
ggplot(Hamilton_CT, aes(x = AREA, y = POPULATION)) + geom_point()
pop <- mutate(Hamilton_CT, x = (AGE_20_TO_24 + AGE_25_TO_29 + AGE_30_TO_34))
proppop <- mutate(Hamilton_CT, x = (AGE_20_TO_24 + AGE_25_TO_29 + AGE_30_TO_34)/POPULATION)
pop.sp <- as(pop, "Spatial")
proppop.sp <- as(proppop, "Spatial")
pop.w <- nb2listw(poly2nb(pl = pop.sp))
proppop.w <- nb2listw(poly2nb(pl = proppop.sp))
pop.3knb <- pop.sp %>% coordinates() %>% dnearneigh(d1 = 0, d2 = 3)
pop.3kw <- nb2listw(include.self(pop.3knb), style = "B")
proppop.3knb <- proppop.sp %>% coordinates() %>% dnearneigh(d1 = 0, d2 = 3)
proppop.3kw <- nb2listw(include.self(proppop.3knb), style = "B")
df1.lg <- localG(pop$x, pop.3kw)
df2.lg <- localG(proppop$x, proppop.3kw)
summary(df1.lg)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-1.7967 -0.6468 -0.1881 0.0000 0.4402 5.2785
summary(df2.lg)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-3.5782 -0.5473 -0.1022 0.0000 0.3814 3.7182
# Calculating the p-values
# Gstar and p-values for the population aged 20 to 34
df1.lg <- as.numeric(df1.lg)
df1.lg <- data.frame(Gstar = df1.lg, p.val = 2 * pnorm(abs(df1.lg), lower.tail = FALSE))
# Gstar and p-values for the proportion of population aged 20 to 34
df2.lg <- as.numeric(df2.lg)
df2.lg <- data.frame(Gstar = df2.lg, p.val = 2 * pnorm(abs(df2.lg), lower.tail = FALSE))
# Join to sf:
join <- Hamilton_CT
join$Gstar <- df1.lg$Gstar
join$p.val <- df1.lg$p.val
join2 <- Hamilton_CT
join2$Gstar <- df2.lg$Gstar
join2$p.val <- df2.lg$p.val
# Classify each tract by the sign and significance of Gstar
joinplot <- mutate(join,
Type = factor(ifelse(Gstar < 0 & p.val <= 0.05, "Low Concentration",
ifelse(Gstar > 0 & p.val <= 0.05, "High Concentration", "Not Significant"))))
joinplot2 <- mutate(join2,
Type = factor(ifelse(Gstar < 0 & p.val <= 0.05, "Low Concentration",
ifelse(Gstar > 0 & p.val <= 0.05, "High Concentration", "Not Significant"))))
Now create local Moran maps for the population and population density in the age group 20-34. What is the difference between using population (absolute) and population density (rate)?
More generally, what do you think should guide the decision of whether to analyze variables as absolute values or rates?
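As a starting point for thinking about this question, the sketch below uses four entirely made-up tracts (the data frame tracts and its values are hypothetical, not taken from Hamilton_CT) to show that ranking zones by an absolute count and by a rate need not agree.

```r
# Made-up tracts: total population and residents aged 20 to 34
tracts <- data.frame(
  tract = c("A", "B", "C", "D"),
  population = c(8000, 2000, 5000, 1000),
  age_20_to_34 = c(1600, 800, 1250, 450)
)

# Rate: share of each tract's population in the age group
tracts$prop_20_to_34 <- tracts$age_20_to_34 / tracts$population

# Rank tracts (1 = highest) by the absolute count and by the rate
tracts$rank_count <- rank(-tracts$age_20_to_34)
tracts$rank_prop <- rank(-tracts$prop_20_to_34)
tracts
```

In this toy table the largest tract has the most residents aged 20 to 34 but the lowest proportion, so a map of counts would partly reflect tract size rather than age composition.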